A State of the Art of Word Sense Induction: A Way Towards Word Sense Disambiguation for Under-Resourced Languages

نویسنده

  • Mohammad Nasiruddin
چکیده

______________________________________________________________________________________________ Word Sense Disambiguation (WSD), the process of automatically identifying the meaning of a polysemous word in a sentence, is a fundamental task in Natural Language Processing (NLP). Progress in this approach to WSD opens up many promising developments in the field of NLP and its applications. Indeed, improvement over current performance levels could allow us to take a first step towards natural language understanding. Due to the lack of lexical resources it is sometimes difficult to perform WSD for under-resourced languages. This paper is an investigation on how to initiate research in WSD for under-resourced languages by applying Word Sense Induction (WSI) and suggests some interesting topics to focus on. RÉSUMÉ _________________________________________________________________________________________________ État de l'art de l'induction de sens: une voie vers la désambiguïsation lexicale pour les langues peu dotées La désambiguïsation lexicale, le processus qui consiste à automatiquement identifier le ou les sens possible d'un mot polysémique dans un contexte donné, est une tâche fondamentale pour le Traitement Automatique des Langues (TAL). Le développement et l'amélioration des techniques de désambiguïsation lexicale ouvrent de nombreuses perspectives prometteuses pour le TAL. En effet, cela pourrait conduire à un changement paradigmatique en permettant de réaliser un premier pas vers la compréhension des langues naturelles. En raison du manque de ressources langagières, il est parfois difficile d'appliquer des techniques de désambiguïsation à des langues peu dotées. C'est pourquoi, nous nous intéressons ici, à enquêter sur comment avoir un début de recherche sur la désambiguïsation lexicale pour les langues peu dotées, en particulier en exploitant des techniques d'induction des sens de mots, ainsi que quelques suggestions de pistes intéressantes à explorer.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Huge Automatically Extracted Training Sets for Multilingual Word Sense Disambiguation

We release to the community six large-scale sense-annotated datasets in multiple language to pave the way for supervised multilingual Word Sense Disambiguation. Our datasets cover all the nouns in the English WordNet and their translations in other languages for a total of millions of sense-tagged sentences . Experiments prove that these corpora can be effectively used as training sets for supe...

متن کامل

Word Sense Induction for Better Lexical Choice

Most words in natural languages are polysemous in nature that is they have multiple possible meanings or senses. The sense in which the word is used determines the translation of the word. We show that incorporating a sense-based translation model into statistical machine translation model consistently improves translation quality across all different test sets of five different language-pairs,...

متن کامل

Noun Sense Induction and Disambiguation using Graph-Based Distributional Semantics

We introduce an approach to word sense induction and disambiguation. The method is unsupervised and knowledge-free: sense representations are learned from distributional evidence and subsequently used to disambiguate word instances in context. These sense representations are obtained by clustering dependency-based secondorder similarity networks. We then add features for disambiguation from het...

متن کامل

Word Sense Induction and Disambiguation Rivaling Supervised Methods

Word Sense Disambiguation (WSD) aims to determine the meaning of a word in context and successful approaches are known to benefit many applications in Natural Language Processing. Although, supervised learning has been shown to provide superior WSD performance, current sense-annotated corpora do not contain a sufficient number of instances per word type to train supervised systems for all words...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1310.1425  شماره 

صفحات  -

تاریخ انتشار 2013